The Commutativity Problem of the MapReduce Framework: A Transducer-Based Approach

Authors

  • Yu-Fang Chen
  • Lei Song
  • Zhilin Wu

Abstract

MapReduce is a popular programming model for data-parallel computation. In MapReduce, a reducer produces an output from a list of inputs. Because of the platform's scheduling policy, these inputs may arrive at a reducer in different orders. The commutativity problem for reducers asks whether the output of a reducer is independent of the order of its inputs. Although the problem is undecidable in general, MapReduce programs in practice are mostly used for data analytics and therefore require only very simple control flow. Exploiting this simplicity, we propose a programming language for reducers in which the commutativity problem is decidable. The main idea of the reducer language is to separate the control flow and data flow of programs and to disallow arithmetic operations in the control flow. The decision procedure for the commutativity problem is obtained through a reduction to the equivalence problem of streaming numerical transducers (SNTs), a novel automata model over infinite alphabets introduced in this paper. The design of SNTs is inspired by streaming transducers (Alur and Cerny, POPL 2011). Nevertheless, the two models are intrinsically different: the outputs of SNTs are integers, whereas those of streaming transducers are data words. The decidability of SNT equivalence is established through an involved combinatorial analysis of how the values of the integer variables evolve during the runs of SNTs.
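As an informal illustration of the question being asked (not the decision procedure proposed in the paper, which goes through SNT equivalence), the following Python sketch checks by brute force whether a reducer, modeled here as a plain function from a list of integers to an integer, gives the same output for every ordering of one concrete input list. The helper names `is_commutative_on`, `sum_reducer`, and `first_minus_rest` are hypothetical and chosen only for this example.

```python
from itertools import permutations
from typing import Callable, Sequence

Reducer = Callable[[list], int]


def is_commutative_on(reducer: Reducer, values: Sequence[int]) -> bool:
    """Return True if `reducer` yields one output for every ordering of `values`.

    This only tests a single finite input (and costs n! reducer calls); it is
    not a decision procedure for commutativity over all inputs.
    """
    outputs = {reducer(list(p)) for p in permutations(values)}
    return len(outputs) == 1


def sum_reducer(xs: list) -> int:
    # Order-insensitive: addition is commutative and associative.
    total = 0
    for x in xs:
        total += x
    return total


def first_minus_rest(xs: list) -> int:
    # Order-sensitive: the first value to arrive is treated specially.
    if not xs:
        return 0
    result = xs[0]
    for x in xs[1:]:
        result -= x
    return result


if __name__ == "__main__":
    print(is_commutative_on(sum_reducer, [3, 1, 2]))       # True
    print(is_commutative_on(first_minus_rest, [3, 1, 2]))  # False
    print(is_commutative_on(first_minus_rest, [5, 5]))     # True, despite being order-sensitive
```

The last call shows why testing is not enough: an order-sensitive reducer can agree on every ordering of a particular input, so a passing check on finitely many inputs proves nothing in general. This is the gap that a decidable reducer language, as proposed in the abstract, is meant to close.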

Publication date: 2016